

A BAYESIAN INTERPRETATION OF
DATA TRIMMING TO REMOVE EXCESS CLAIMS

William  S.  Jewell 

University  of  California,  Berkeley 
ETH-Zurich


Abstract 

The  effect  of  excess  or  catastrophic  claims 
is  well  recognized  in  insurance.  For  example, 
in  experience  rating  it  is  customary  to  truncate 
the  data  to  minimize  the  effect  of  such  outliers; 
Gisler  has  recently  proposed  a  credibility  formula 
using  such  data  trimming.  This  paper  develops  a 
model  of  the  excess  claims  process  and  finds  the 
exact  Bayesian  forecast.  The  resulting  forecast 
form  is  approximately  a  data  trim,  thus  justifying 
the  simpler,  heuristic  approach. 




Zurich,  March,  1981 




Supported by the Air Force Office of Scientific Research (AFSC), USAF, under Grant AFOSR-81-0122. Reproduction in whole or in part is permitted for any purpose of the United States Government.




This document has been approved for public release and sale; its distribution is unlimited.


The effect of excess or catastrophic claims is well recognized in insurance. Typically, one wishes not only to analyze them in detail to determine and, if possible, correct their causes, but also to modify the data so as to minimize their effect upon normal operating procedures of the firm.

For example, in experience rating, data $\mathbf{x} = (x_1, x_2, \dots, x_n)$ collected from a policyholder's experience in years $1, 2, \dots, n$ is used to modify his premium for year $n+1$. If $y = x_{n+1}$ is the random variable denoting next year's total paid claims, the fair premium will be just the regression of $y$ on the data $\mathbf{x}$, or $\mathcal{E}(y \mid \mathbf{x})$. In credibility theory, it is assumed that this forecast is linear in the data, giving the well-known formula:

$$\mathcal{E}(y \mid \mathbf{x}) \approx f(\mathbf{x}) = (1 - Z_n)\,m + Z_n\Big(\frac{1}{n}\sum_{t=1}^{n} x_t\Big). \tag{1.1}$$

Here $m$ is the "manual" (fair, no-data) premium, and $Z_n = n/(n+N)$ is the credibility factor with time constant $N$ determined empirically or from a Bayesian model (see, e.g., Norberg (1979) for further details).
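As a concrete illustration of (1.1), the short Python sketch below computes the linear credibility forecast for given $m$ and $N$. The function name and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def credibility_forecast(x, m, N):
    """Linear credibility forecast (1.1): (1 - Z_n) * m + Z_n * mean(x), Z_n = n / (n + N)."""
    n = len(x)
    if n == 0:
        return m                      # no data: the manual premium
    z = n / (n + N)                   # credibility factor Z_n
    return (1.0 - z) * m + z * sum(x) / n


# Hypothetical three-year record, manual premium m = 10, time constant N = 2.
print(credibility_forecast([12.0, 8.0, 15.0], m=10.0, N=2.0))
print(credibility_forecast([12.0, 8.0, 150.0], m=10.0, N=2.0))
```

Replacing the third claim by a large one moves the forecast from about 11 to about 38, since each observation enters with weight $Z_n/n = 0.2$; this is the sensitivity to excess claims discussed next.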

The effect of an excess claim upon experience rating is obvious from (1.1): each observation enters the forecast with weight $Z_n/n$, so a single very large claim raises next year's premium in direct proportion to its size. What one would like to do is to detect and remove this claim from the data, and spread all or a portion of the excess amount over all the policyholders, perhaps by charging it against a special reserve. However, in many situations it is not possible or economical to use qualitative information about the claim to decide if it is of ordinary or excess type, and one must use a numerical procedure to "cleanse" the data before using (1.1). Based upon heuristic methods used in industry, A. Gisler (1980) proposed to replace (1.1) by:


$$f(\mathbf{x}) = a + b\sum_{t=1}^{n}\min(x_t, M), \tag{1.2}$$


where  the  parameters  (a,b,M)  are  adjusted  so  as  to 
minimize  the  mean-squared  error  in  the  forecast;  the 
result  could  be  called  a  data-trimmed  credibility  formula. 
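A minimal sketch of (1.2) follows. The choice $a = (1 - Z_n)m$, $b = Z_n/n$ is a hypothetical one made here only to show that (1.2) contains (1.1) as the special case $M = \infty$; in Gisler's procedure $a$, $b$ and $M$ are instead fitted jointly by minimizing the mean-squared error.

```python
import math


def trimmed_forecast(x, a, b, M):
    """Data-trimmed credibility forecast (1.2): a + b * sum_t min(x_t, M)."""
    return a + b * sum(min(xt, M) for xt in x)


# Hypothetical parameters that make (1.2) coincide with (1.1) when M is infinite.
m, N, x = 10.0, 2.0, [12.0, 8.0, 150.0]
n = len(x)
z = n / (n + N)
a, b = (1.0 - z) * m, z / n
untrimmed = trimmed_forecast(x, a, b, math.inf)   # same value as (1.1)
trimmed = trimmed_forecast(x, a, b, 20.0)         # large claim capped at M = 20
print(untrimmed, trimmed)
```

With $M = 20$ the claim of 150 contributes at most $bM$ to the forecast, so the premium drops from 38 to 12.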

2. Bayesian Model for Outliers


We now develop a model which describes how excess claims arise, and then find the exact Bayesian prediction formula. By comparing this with Gisler's form (1.2), we will be able to provide additional motivation for the trimming procedure.

First of all, we assume that an ordinary claim random variable, $x_0$, has a known density, $p_0(x_0 \mid \theta)$, depending upon an unknown parameter $\theta$ which characterizes the different policyholders and their exposure characteristics. The first two moments of this random variable are:


$$m_0(\theta) = \mathcal{E}(x_0 \mid \theta), \qquad v_0(\theta) = \mathcal{V}(x_0 \mid \theta). \tag{2.1}$$


In the usual experience rating model, we are given several independent observations of the $x_0$ type from a policy with fixed, but unknown, $\theta$, and we wish to estimate the mean of the next observation from the same policy. This is equivalent to estimating $m_0(\theta)$, given a prior density on $\theta$ and the data $\mathbf{x}$. If it is known that all data is of ordinary type, then in many cases the credibility forecast (1.1) is exact, or a good approximation.

Now, however, suppose that it is occasionally possible that we observe instead an excess claim random variable, $x_e$, with density $p_e(x_e)$ not depending upon $\theta$ (although this can easily be generalized, if desired). This excess claim is considered to be the result of some extraordinary cause, so that the density $p_e$ will have large mean and variance compared with every density $p_0$. We also assume that there is no qualitative way in which one can identify an excess claim as such; thus, the densities should have overlapping ranges, since otherwise there would be no difficulty in separating excess claims based upon their magnitude.

We continue to let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ represent the observational data, assumed independent, given $\theta$. But the observation random variable, $x_t$ $(t = 1, 2, \dots, n)$, is now sometimes an ordinary, sometimes an excess random variable, and we assume that there is a known contamination probability, $\pi$, that independently selects if an ordinary claim is replaced by an excess claim. In other words, we assume that the individual observations follow the mixed density:


$$p(x_t \mid \theta) = (1-\pi)\,p_0(x_t \mid \theta) + \pi\,p_e(x_t), \tag{2.2}$$

so that the likelihood of $\mathbf{x}$,

$$p(\mathbf{x} \mid \theta) = \prod_{t=1}^{n} p(x_t \mid \theta), \tag{2.3}$$

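consists of $2^n$ terms, one for each way of labelling the observations as ordinary or excess; a term with $k$ excess observations carries the factor $\pi^k(1-\pi)^{n-k}$. Since $\pi$ is small, however, only the terms with no or only a few excess observations will generally be significant (there are usually only zero, one, or a few excess claims in any small sample).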
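The expansion of (2.3) into $2^n$ terms can be checked numerically. The sketch below works at one fixed value of $\theta$ and treats the density values $p_0(x_t \mid \theta)$ and $p_e(x_t)$ as given numbers (hypothetical inputs, not from the text); it expands the product term by term and reports what share of the total comes from terms with $k$ excess claims.

```python
import itertools
import math


def likelihood_product(p0_vals, pe_vals, pi):
    """(2.3) at a fixed theta: product over t of the mixed densities (2.2)."""
    return math.prod((1.0 - pi) * a + pi * b for a, b in zip(p0_vals, pe_vals))


def likelihood_expanded(p0_vals, pe_vals, pi):
    """The same likelihood summed term by term over all 2^n ordinary/excess labellings.

    Returns the total and, for each k, the share of the total coming from
    terms with exactly k excess claims."""
    n = len(p0_vals)
    by_k = [0.0] * (n + 1)
    for labels in itertools.product((True, False), repeat=n):   # True = ordinary
        term = 1.0
        for is_ordinary, a, b in zip(labels, p0_vals, pe_vals):
            term *= (1.0 - pi) * a if is_ordinary else pi * b
        by_k[labels.count(False)] += term
    total = sum(by_k)
    return total, [v / total for v in by_k]


# Hypothetical density values p_0(x_t | theta) and p_e(x_t) for n = 4 observations.
p0_vals = [0.03, 0.01, 0.002, 0.05]
pe_vals = [0.004, 0.006, 0.008, 0.002]
total, shares = likelihood_expanded(p0_vals, pe_vals, 0.1)
print(total, likelihood_product(p0_vals, pe_vals, 0.1), shares)
```

With all density values set to 1, the share of terms with $k$ excess claims reduces to the binomial probability $\binom{n}{k}\pi^k(1-\pi)^{n-k}$; for $n = 5$ and $\pi = 0.1$ the terms with at most one excess claim already carry about 92% of the total.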

As in other experience rating models, we assume that we are given a prior density, $p(\theta)$, on the unknown parameter, so that Bayes' law then gives a posterior-to-data density for the unknown parameter of:


$$p(\theta \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \theta)\,p(\theta)}{p(\mathbf{x})}, \tag{2.4}$$


*From this point on, we are using the usual Bayesian trick of using $p(\cdot)$ for several different densities, and letting the variables "speak for themselves".


where $p(\mathbf{x})$ is the integral of the numerator over $\theta$. Now, however, we must remember that we are not interested in predicting just the next observation, but rather in predicting the next observation, given that it is of ordinary type; this random variable, call it $y_0$, has a density $p_0(y_0 \mid \theta)$ if $\theta$ were known. It follows then that, given the data $\mathbf{x}$, we can form the Bayesian predictive density of $y_0$ from (2.4) as follows:

$$p(y_0 \mid \mathbf{x}) = \int p_0(y_0 \mid \theta)\,p(\theta \mid \mathbf{x})\,d\theta. \tag{2.5}$$

The exact Bayesian mean predictor of $y_0$ is then just the first moment:

$$\mathcal{E}(y_0 \mid \mathbf{x}) = f(\mathbf{x}) = \int m_0(\theta)\,p(\theta \mid \mathbf{x})\,d\theta. \tag{2.6}$$

To  better  understand  how  this  formula  depends  upon  the  data, 
we  need  to  develop  further  the  likelihood  (2.3). 
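The chain (2.2) to (2.6) can be evaluated numerically on a grid of $\theta$ values. The sketch below assumes, for illustration only, normal ordinary claims with mean $\theta$ (so that $m_0(\theta) = \theta$), a normal prior on $\theta$ and a normal excess density; the text makes none of these choices. With $\pi = 0$ it should agree with the conjugate normal credibility forecast, which serves as a check.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))


def exact_forecast(x, pi, sigma0, prior_mean, prior_sd, excess_mean, excess_sd,
                   n_grid=4001):
    """Grid evaluation of (2.2)-(2.6) under an assumed normal model:
    p_0(x | theta) = N(theta, sigma0^2), so m_0(theta) = theta;
    p(theta) = N(prior_mean, prior_sd^2); p_e(x) = N(excess_mean, excess_sd^2)."""
    theta = np.linspace(prior_mean - 10 * prior_sd, prior_mean + 10 * prior_sd, n_grid)
    x = np.asarray(x, dtype=float)
    # (2.2): mixed density of every observation at every grid value of theta
    p0 = normal_pdf(x[:, None], theta[None, :], sigma0)          # shape (n, grid)
    pe = normal_pdf(x, excess_mean, excess_sd)[:, None]          # shape (n, 1)
    mixed = (1.0 - pi) * p0 + pi * pe
    # (2.3): likelihood is the product over observations
    # (for long records one would accumulate log-likelihoods instead)
    likelihood = mixed.prod(axis=0)
    # (2.4): posterior on the grid; the uniform spacing in theta cancels
    posterior = likelihood * normal_pdf(theta, prior_mean, prior_sd)
    posterior /= posterior.sum()
    # (2.6): forecast of an ordinary claim = posterior mean of m_0(theta) = theta
    return float(posterior @ theta)


print(exact_forecast([12.0, 8.0, 15.0], 0.0, 5.0, 10.0, 3.0, 50.0, 20.0))
print(exact_forecast([200.0], 0.1, 5.0, 10.0, 3.0, 50.0, 20.0))
```

With $\pi = 0$ the first call returns approximately $565/52 \approx 10.865$, the normal-normal credibility forecast; in the second call a single claim of 200 is almost surely excess, so the forecast stays at the prior mean of 10.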


3.  Single  Observation  Case 

First, suppose that only $n = 1$ observation has been made. Then (2.3) has only two terms, and the exact forecast is:


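$$f(x_1) = \frac{1}{p(x_1)}\Big[(1-\pi)\int m_0(\theta)\,p_0(x_1 \mid \theta)\,p(\theta)\,d\theta + \pi\,p_e(x_1)\int m_0(\theta)\,p(\theta)\,d\theta\Big], \tag{3.1}$$

where

$$p(x_1) = (1-\pi)\,p_0(x_1) + \pi\,p_e(x_1); \qquad p_0(x_1) = \int p_0(x_1 \mid \theta)\,p(\theta)\,d\theta. \tag{3.2}$$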
The second integral in (3.1) is just the a priori expected value of $y_0$ (the manual premium):

$$m_0 = \int m_0(\theta)\,p(\theta)\,d\theta = \mathcal{E}\,y_0, \tag{3.3}$$

which would be the "forecast" if no data were available.

The first integral in (3.1) is more interesting, as it is related to the Bayesian prediction in which it is known that the observation is of ordinary type. In contrast to $f(x_1)$, which is the prediction from an arbitrary observation following (2.2), we can define $f_0(x_1)$ as the ordinary-observation prediction, obtained from (2.6) by setting $\pi = 0$:

$$\mathcal{E}(y_0 \mid x_1 \text{ ordinary}) = f_0(x_1) = \int m_0(\theta)\,\frac{p_0(x_1 \mid \theta)\,p(\theta)}{p_0(x_1)}\,d\theta. \tag{3.4}$$


This could, of course, follow the linear credibility law (1.1). Finally, we rewrite the exact forecast as:

$$f(x_1) = \frac{1}{p(x_1)}\big[(1-\pi)\,p_0(x_1)\,f_0(x_1) + \pi\,p_e(x_1)\,m_0\big], \tag{3.5}$$

which can be rewritten in two revealing forms: the first,

$$f(x_1) = \frac{f_0(x_1) + \phi(x_1)\,m_0}{1 + \phi(x_1)}, \tag{3.6}$$

with

$$\phi(x_1) = \frac{\pi}{1-\pi}\cdot\frac{p_e(x_1)}{p_0(x_1)} \tag{3.7}$$

as an "odds-likelihood-ratio"; the second in a credibility format:

$$f(x_1) = [1 - Z(x_1)]\,m_0 + Z(x_1)\,f_0(x_1), \tag{3.8}$$




with a new data-dependent credibility factor:

$$Z(x_1) = [1 + \phi(x_1)]^{-1} = \frac{(1-\pi)\,p_0(x_1)}{(1-\pi)\,p_0(x_1) + \pi\,p_e(x_1)}. \tag{3.9}$$

$Z(x_1)$ is essentially the a posteriori probability that the observation $x_1$ is ordinary.
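The equivalence of (3.5), (3.6) and (3.8), and the reading of $Z(x_1)$ as a posterior probability, can be checked with a few lines of arithmetic. The density values below are hypothetical numbers at a single observed $x_1$, not taken from the text.

```python
def forecast_forms(p0_x, pe_x, f0_x, m0, pi):
    """Evaluate the one-observation forecast in the forms (3.5), (3.6) and (3.8).

    p0_x = p_0(x_1) (averaged ordinary density), pe_x = p_e(x_1), f0_x = f_0(x_1)."""
    p_x = (1.0 - pi) * p0_x + pi * pe_x                         # p(x_1), (3.2)
    f_35 = ((1.0 - pi) * p0_x * f0_x + pi * pe_x * m0) / p_x    # (3.5)
    phi = (pi / (1.0 - pi)) * (pe_x / p0_x)                     # odds-likelihood-ratio (3.7)
    f_36 = (f0_x + phi * m0) / (1.0 + phi)                      # (3.6)
    z = 1.0 / (1.0 + phi)                                       # Z(x_1), (3.9)
    f_38 = (1.0 - z) * m0 + z * f0_x                            # (3.8)
    return f_35, f_36, f_38, phi, z


# Hypothetical density values at one observed x_1.
f_35, f_36, f_38, phi, z = forecast_forms(p0_x=0.02, pe_x=0.005, f0_x=14.0, m0=10.0, pi=0.1)
print(f_35, f_36, f_38, phi, z)
```

All three forecasts agree (about 13.89 here), and `z` equals $(1-\pi)p_0(x_1)/p(x_1)$, the posterior probability that $x_1$ is ordinary.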

In the usual situation, the averaged ordinary density $p_0(x_1)$ and the excess density $p_e(x_1)$ might appear as in Figure 1, giving then the weighting functions $\phi(x_1)$ or $Z(x_1)$ shown in Figure 2.


4. Comparison with Trimming in the Credibility Case

As discussed in the first Section, it is often the case that $f_0(\mathbf{x})$ is linear in the data $\mathbf{x}$, i.e., it follows (1.1), with $m$ replaced by $m_0$, and the ordinary (non-data-dependent) credibility factors $Z_n$ replaced by:


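$$Z_{0n} = \frac{n}{n + N_0}, \qquad N_0 = \frac{\mathcal{E}\{v_0(\theta)\}}{\mathcal{V}\{m_0(\theta)\}}. \tag{4.1}$$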


Thus, in the one-dimensional case, $f_0(x_1)$ in (3.8) would be replaced by:

$$f_0(x_1) = m_0 + Z_{01}(x_1 - m_0). \tag{4.2}$$

This means that the exact Bayesian forecast would have the interesting shape shown in Figure 3.

This shape shows us that, if $x_1$ is small, we believe it is of ordinary type, and we should experience rate according to the linear law (4.2). But as $x_1$ increases beyond $m_0$ into the region where the odds-likelihood-ratio becomes significant, we begin to hedge our bets on whether we have an ordinary observation, and to reduce the dependence of the forecast on $x_1$. Finally, for $x_1$ very large, an excess observation is highly credible, and we settle for the "no-information" manual rating, $m_0$.

[Figure 1: averaged ordinary density $p_0(x_1)$ and excess density $p_e(x_1)$.]

[Figure 3: exact Bayesian forecast $f(x_1)$.]
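To see the shape described above, the sketch below evaluates (3.8) with the linear ordinary forecast (4.2). The normal densities and parameter values are assumptions borrowed from the numerical example of Section 5; any other averaged ordinary and excess densities could be passed in instead.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def exact_forecast_credibility(x1, m0, z01, pi, p0_bar, pe):
    """One-observation exact forecast (3.8) when f_0 follows the linear law (4.2).

    p0_bar(x) is the averaged ordinary density p_0(x_1); pe(x) is p_e(x_1)."""
    ordinary = (1.0 - pi) * p0_bar(x1)
    excess = pi * pe(x1)
    z = ordinary / (ordinary + excess)          # Z(x_1), (3.9)
    f0 = m0 + z01 * (x1 - m0)                   # f_0(x_1), (4.2)
    return (1.0 - z) * m0 + z * f0              # f(x_1), (3.8)


def p0_bar(x):
    return normal_pdf(x, 10.0, 5.0)             # assumed averaged ordinary density


def pe(x):
    return normal_pdf(x, 50.0, 20.0)            # assumed excess density


for x1 in (0.0, 10.0, 20.0, 25.0, 30.0, 60.0):
    print(x1, round(exact_forecast_credibility(x1, 10.0, 0.5, 0.1, p0_bar, pe), 3))
```

The printed values increase up to $x_1 = 20$ and then fall back toward $m_0 = 10$ as $x_1$ grows, which is the non-monotone "bump" discussed above.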

We see that the resulting forecast is quite similar to that obtained by applying ordinary credibility theory to trimmed data, that is, replacing $x_1$ by $\min(x_1, M)$ as in (1.2). Although sharp trimming will not reproduce the "bump" shown in Figure 3, the difference will be small, because the three parameters $(a, b, M)$ in (1.2) can be adjusted to minimize the mean-squared error, giving a straight-line portion of the forecast which differs slightly from (4.2).

Another point in favor of trimming is that it might be difficult to implement the exact predictive form of Figure 3 in a real experience-rating scheme; it would be hard to explain a plan in which a policy with a larger claim might have a smaller premium next year!


5.  A  Numerical  Example 

Figure 4 shows a numerical example in which normal densities were chosen for the average ordinary and excess densities; the means and standard deviations were:

$$m_0 = \mu_0 = 10;\quad \sigma_0 = 5;\quad \mu_e = 50;\quad \sigma_e = 20.$$

A contamination probability $\pi = 0.1$ and a credibility factor $Z_{01} = 0.5$ were used, so

$$f_0(x_1) = 10 + 0.5\,(x_1 - 10).$$

The resulting exact forecast $f(x_1)$ is plotted in Figure 4, together with the optimal trimmed forecast, which is approximately

$$f(x_1) \approx 10 + 0.441\,[\min(x_1, 14.7) - 10].$$

Note that in using Gisler's results, one must subtract $\pi\mu_e$ from his forecast, as he does not have an explicit model for the generation of excess claims, and is predicting a future observation of either type.


[Figure 4. Exact Bayesian and Trimmed Data Forecasts for Example. Curves of $f(x_1)$ labelled "Exact Bayesian" and "Trimmed Data".]
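The sketch below recomputes the exact forecast of this example from (3.8), (3.9) and (4.2), and then fits the trimmed form $a + b\min(x_1, M)$ by a grid search over $M$, with weighted least squares for $a$ and $b$. It fits the trimmed form directly to $f(x_1)$ under the marginal density $p(x_1)$ of (3.2); because $f(x_1)$ is the conditional mean of $y_0$, this is equivalent to minimizing the mean-squared error in predicting $y_0$. This criterion, the grids and the helper names are assumptions of the sketch: it is not Gisler's own fitting procedure and is not guaranteed to reproduce the coefficients quoted above.

```python
import numpy as np

# Parameter values of the numerical example above.
M0, SD0 = 10.0, 5.0        # averaged ordinary density N(m_0, sigma_0^2)
MUE, SDE = 50.0, 20.0      # excess density N(mu_e, sigma_e^2)
PI, Z01 = 0.1, 0.5         # contamination probability, credibility factor


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))


def exact_forecast(x):
    """Exact one-observation forecast (3.8), with Z from (3.9) and f_0 from (4.2)."""
    ordinary = (1.0 - PI) * normal_pdf(x, M0, SD0)
    excess = PI * normal_pdf(x, MUE, SDE)
    z = ordinary / (ordinary + excess)
    return (1.0 - z) * M0 + z * (M0 + Z01 * (x - M0))


def fit_trimmed(M_grid, x_grid):
    """Return (a, b, M, mse) minimising E[(f(x) - a - b*min(x, M))^2] under p(x_1)."""
    w = (1.0 - PI) * normal_pdf(x_grid, M0, SD0) + PI * normal_pdf(x_grid, MUE, SDE)
    w = w / w.sum()                       # discretised marginal density (3.2)
    f = exact_forecast(x_grid)
    mean_f = w @ f
    best = None
    for M in M_grid:
        u = np.minimum(x_grid, M)         # trimmed observation
        du = u - w @ u
        b = (w @ (du * (f - mean_f))) / (w @ (du * du))   # weighted least-squares slope
        a = mean_f - b * (w @ u)
        mse = w @ ((f - a - b * u) ** 2)
        if best is None or mse < best[3]:
            best = (float(a), float(b), float(M), float(mse))
    return best


x_grid = np.linspace(-40.0, 160.0, 8001)
M_grid = np.linspace(5.0, 160.0, 1551)    # M = 160 means no trimming on this grid
a, b, M, mse = fit_trimmed(M_grid, x_grid)
print(f"f(x1) ~ {a:.3f} + {b:.3f} * min(x1, {M:.1f}),  mse = {mse:.4f}")
```

Because the grid of $M$ values includes the untrimmed case, the fitted trimmed forecast can never do worse, in this mean-squared sense, than the best straight line in $x_1$.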


6.  General  Case 

From the preceding, it should be clear that the optimal predictor for an arbitrary number of observations $n$ consists of $2^n$ terms from (2.2) and (2.3), corresponding to all the different ways in which the data $\mathbf{x} = (x_1, x_2, \dots, x_n)$ can be partitioned into ordinary and excess categories. The formulae are greatly simplified in the general case if we use set-theoretic notation.

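Let $\mathcal{N} = \{1, 2, \dots, n\}$, let $A$ be any subset of $\mathcal{N}$ (including $\mathcal{N}$ and the empty set $\emptyset$), let $\bar A = \mathcal{N} - A$, and let $J = |A|$ be the number of elements of $A$. Use also $A$ as a subscript to denote the corresponding subset of observables, so that, for example, $\mathbf{x}_A = \{x_t : t \in A\}$. Then the (marginal) density of this subset, if it is all ordinary, is: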


$$p_0(\mathbf{x}_A) = \int \prod_{t \in A} p_0(x_t \mid \theta)\,p(\theta)\,d\theta, \tag{6.1}$$


whereas the density of the subset, if it is all excess, is:

$$p_e(\mathbf{x}_A) = \prod_{t \in A} p_e(x_t). \tag{6.2}$$


For consistency in the following equations, set $p_0(\mathbf{x}_\emptyset) = p_e(\mathbf{x}_\emptyset) = 1$.

Next, we define $f_0(\mathbf{x}_A)$ to be the Bayesian forecast of an ordinary random variable, $y_0$, using only the data $\mathbf{x}_A$, assumed to be all of ordinary type. This might be the $J$-term generalization of (4.2), e.g., (1.1) with $m$ replaced by $m_0$, $Z_n$ replaced by $Z_{0J}$, and, of course, using only the data $\mathbf{x}_A$. For consistency, the no-data forecast is $f_0(\mathbf{x}_\emptyset) = m_0$.

Then, examination of the expansion of (2.3) shows that the forecast consists of the weighted sum of $2^n$ forecasts:

$$f(\mathbf{x}) = \sum_{A \subseteq \mathcal{N}} Z_A(\mathbf{x})\,f_0(\mathbf{x}_A), \tag{6.3}$$

where the data-dependent credibility factors are:

$$Z_A(\mathbf{x}) = K\,(1-\pi)^{J}\,\pi^{\,n-J}\,p_0(\mathbf{x}_A)\,p_e(\mathbf{x}_{\bar A}), \tag{6.4}$$

and $K$ is adjusted so that the factors sum to unity.

The sum in (6.3) is over all $2^n$ subsets of $\mathcal{N}$, although, as previously stated, it is unlikely that more than a few excess claims would be present with $\pi$ small. This suggests the following computational strategy: arrange the data in decreasing magnitude, $x_1 \ge x_2 \ge \dots \ge x_n$; take $\bar A$ successively to be $\emptyset, \{1\}, \{1, 2\}, \dots$, and compute the corresponding credibility factors $Z_A$. At some point these factors will become quite small, and the remaining terms in (6.3) can be neglected. One can, if desired, bound the neglected terms.
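The sum (6.3) with the weights (6.4) can be sketched by direct enumeration of subsets. To make $p_0(\mathbf{x}_A)$ and $f_0(\mathbf{x}_A)$ computable, the sketch assumes a conjugate normal model that the text does not specify: ordinary claims $x_t \mid \theta \sim N(\theta, \sigma^2)$ with $\theta \sim N(\mu, \tau^2)$, so that $f_0(\mathbf{x}_A)$ is exactly the $J$-term credibility formula with $N_0 = \sigma^2/\tau^2$, plus a normal excess density. The optional `max_excess` argument is a hypothetical simplification of the strategy above: it keeps only subsets with at most that many excess claims and renormalizes, rather than following the ordered sequence of $\bar A$. Weights are handled in logarithms to avoid underflow.

```python
import itertools
import math

import numpy as np


def log_normal_pdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def log_p0_joint(xs, mu, sigma, tau):
    """log p_0(x_A), (6.1), under the assumed model x_t | theta ~ N(theta, sigma^2),
    theta ~ N(mu, tau^2): the J ordinary claims are jointly normal."""
    J = len(xs)
    if J == 0:
        return 0.0                                   # convention p_0(x_empty) = 1
    cov = sigma ** 2 * np.eye(J) + tau ** 2 * np.ones((J, J))
    diff = np.asarray(xs, dtype=float) - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return float(-0.5 * (J * math.log(2.0 * math.pi) + logdet + quad))


def f0_subset(xs, mu, sigma, tau):
    """f_0(x_A): J-term credibility forecast, Z_0J = J / (J + N_0), N_0 = sigma^2 / tau^2."""
    J = len(xs)
    if J == 0:
        return mu                                    # f_0(x_empty) = m_0
    z = J / (J + sigma ** 2 / tau ** 2)
    return mu + z * (sum(xs) / J - mu)


def general_forecast(x, pi, mu, sigma, tau, mu_e, sigma_e, max_excess=None):
    """(6.3)-(6.4): weighted sum of f_0(x_A) over subsets A of ordinary claims.
    max_excess (hypothetical option) drops subsets with more excess claims than that."""
    n = len(x)
    if pi == 0.0:
        return f0_subset(list(x), mu, sigma, tau)    # every claim is ordinary
    max_excess = n if max_excess is None else max_excess
    log_w, forecasts = [], []
    for k in range(max_excess + 1):                  # k = n - J excess claims
        for excess in itertools.combinations(range(n), k):
            ordinary = [x[t] for t in range(n) if t not in excess]
            log_w.append((n - k) * math.log(1.0 - pi) + k * math.log(pi)
                         + log_p0_joint(ordinary, mu, sigma, tau)
                         + sum(log_normal_pdf(x[t], mu_e, sigma_e) for t in excess))
            forecasts.append(f0_subset(ordinary, mu, sigma, tau))
    log_w = np.array(log_w)
    z = np.exp(log_w - log_w.max())                  # unnormalised Z_A, computed stably
    z /= z.sum()                                     # plays the role of K in (6.4)
    return float(z @ np.array(forecasts))


sd = math.sqrt(12.5)
print(general_forecast([20.0], 0.1, 10.0, sd, sd, 50.0, 20.0))
print(general_forecast([12.0, 8.0, 300.0], 0.1, 10.0, sd, sd, 50.0, 20.0))
```

With $\sigma^2 = \tau^2 = 12.5$ and $n = 1$, the first call reproduces the single-observation case (3.8) for the numerical example above (averaged ordinary density $N(10, 5^2)$, $Z_{01} = 0.5$). In the second call the claim of 300 is, in effect, assigned to the excess category, so the forecast is essentially the one obtained from the two remaining claims alone.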




7.  Continuing  Research 

The results presented here are part of a continuing research effort, joint with H. Bühlmann and A. Gisler. Current effort is devoted to multi-dimensional computations, and to comparison of trimmed-data forecasts with the exact Bayesian prediction. Preliminary results indicate that the Gisler approximation continues to be quite good in the multi-dimensional case. These and other results will be presented in an expanded version of this paper later this year.


References 

1. A. Gisler, "Optimum Trimming of Data in the Credibility Model", Bulletin of the Society of Swiss Actuaries, 1980, Heft 3, 313-325.

2. R. Norberg, "The Credibility Approach to Experience Rating", Scandinavian Actuarial Journal, 1979, No. 4, 181-221.


